1. Introduction


The conflict between Ukraine and Russia is changing Europe, which is facing a crisis destined 
to reshape the internal and external relations of the continent, shifting international balances. As 
the war in Ukraine continues, Russian propaganda about the conflict evolves. Modern propaganda
can be seen as an attempt to influence opinion by communicating ideas and values for a
specific persuasive purpose (Abd Kadir et al. 2014).

A political organisation wants to convince people to agree with the message presented
and to accept it as their own belief, rejecting other points of view. It has been argued that
practically all governments use forms of propaganda to bolster their support among other nations
and their own citizenry (Pratkanis and Aronson, 1991).

For this reason, we present the preliminary results of an analysis of the content of
online newspapers used as propaganda tools by the Russian government. The selected newspapers
create and amplify the narrative of the conflict, conveying information filtered by the Kremlin to
advance Putin's campaign on the war. The goal of the work, therefore, is to understand the
communication strategies that the Russian press used to motivate and justify the conflict in
Ukraine and what types of information the selected newspapers disseminate. To this end, we
used Symmetric Non-negative Matrix Factorization (SymNMF), a dimension reduction technique,
to extract the main themes found in Russian newspaper articles and identify the topics used for
propaganda.


2. Non-negative matrix factorization 


Non-negative matrix factorization (NMF) is a dimension reduction method used to uncover latent
low-dimensional structures in high-dimensional data (Kim and Park, 2008). NMF is an
unsupervised approach in which the low-rank factor matrices are constrained to have only
non-negative elements (Kuang et al. 2015). As a result, each data point is represented as a
linear combination of basis vectors with non-negative coefficients.

Non-negativity improves the interpretability of the information extracted from a given data
matrix, allowing a better understanding of the results obtained from the analysis. This is in
contrast to dimensionality reduction techniques that rely on the singular value decomposition
(SVD), such as principal component analysis (PCA). One of the major problems with
PCA is that the basis vectors have positive and negative components, and the data are represented
as a linear combination of these vectors with positive and negative coefficients (Pauca et al.
2004). This is because the principal components are orthogonal, implying the presence of some
negative values. Factors obtained from NMF, on the other hand, are non-negative vectors that
give an additive representation of the data, but are not necessarily orthogonal (Casalino et al. 2016).
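
As a small numerical illustration of this contrast, the following sketch (Python with NumPy) computes the SVD of a strictly positive toy matrix. The matrix values are hypothetical and not taken from our corpus, and the sketch uses a plain SVD without the mean-centering step of PCA. Because the first right singular vector of a strictly positive matrix has entries of a single sign, the second one, being orthogonal to it, must mix positive and negative entries.

```python
import numpy as np

# Toy strictly positive data matrix (hypothetical values, for illustration only).
X = np.array([
    [3.0, 1.0, 2.0],
    [1.0, 4.0, 1.0],
    [2.0, 1.0, 5.0],
    [1.0, 2.0, 1.0],
])

# SVD: X = U @ diag(s) @ Vt, with orthonormal singular vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

second = Vt[1]  # second right singular vector
has_mixed_signs = bool((second > 0).any() and (second < 0).any())
print("second right singular vector:", np.round(second, 3))
print("mixed signs:", has_mixed_signs)
```

Such mixed-sign components are what makes SVD-based factors harder to read as additive combinations of terms.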

Given a matrix X of size m x n, NMF decomposes X into a matrix W of size m x k (called
the basis matrix) and a matrix H of size k x n (called the encoding matrix), such that their product
approximates the matrix X:


X \approx WH    (1.1)
where W and H are both non-negative matrices. The product WH is an approximate
factorization of rank at most k. Generally, the rank k of the two matrices W and H is assumed to
satisfy k << min{m, n} (Gaujoux and Seoighe, 2010). The value of the parameter k identifies
the number of factors used to explain the data (Casalino et al. 2016). The matrix multiplication can
be read as computing the column vectors of X as linear combinations of the column
vectors of W, using the coefficients given by the columns of H. Each column of X can be computed as
follows:


x_i \approx W h_i    (1.2)


where x_i is the i-th column vector of the matrix X and h_i is the i-th column vector of the matrix H.
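
The column-wise reading of the product in (1.2) can be checked numerically. The sketch below uses small hypothetical factors W and H (values chosen only for illustration) and verifies that each column of WH equals W applied to the matching column of H.

```python
import numpy as np

# Hypothetical non-negative factors with m = 4, k = 2, n = 3.
W = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0],
              [1.0, 1.0]])
H = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.0, 0.0]])

X_hat = W @ H  # full product, shape (4, 3)

# Each column of the product is W times the matching column of H.
columns_match = all(
    np.allclose(X_hat[:, i], W @ H[:, i]) for i in range(H.shape[1])
)
print(columns_match)
```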

Suppose we have n data points represented as the columns of X = [x_1, ..., x_n], and we try to
group them into k clusters. When W and H are subject to non-negativity, it is possible to interpret
the dimension reduction in (1.2) as a clustering result: the columns of the first factor W provide the
basis of a latent k-dimensional space, and the columns of the second factor H provide the
representation of x_1, ..., x_n in the latent space. The cluster assignment for each data point is then
made by choosing the largest entry in the corresponding column of H (Kuang et al. 2015).

The matrices W and H are found by solving an optimization problem defined with the 
Frobenius norm (a distance measure between two given matrices), the Kullback-Leibler (KL) 
divergence (a distance measure between two probability distributions), or other divergences. 

The usual approach to NMF is to approximate X by calculating W and H to minimize the 
Frobenius norm of the X - WH difference, such that (Pauca et al. 2004): 


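\min_{W \ge 0,\, H \ge 0} \| X - WH \|_F^2 = \min_{W \ge 0,\, H \ge 0} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( X_{ij} - (WH)_{ij} \right)^2    (1.3)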
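
A minimal sketch of how the problem in (1.3) can be solved is given below. It uses the classical multiplicative update rules for the Frobenius objective, which is one possible solver and not necessarily the one used by the works cited here. The function names, the toy matrix and the parameters `n_iter`, `eps` and `seed` are assumptions made for illustration. The cluster of each data point is then read off as the largest entry in the corresponding column of H.

```python
import numpy as np

def nmf(X, k, n_iter=500, eps=1e-10, seed=0):
    """Approximate X (m x n, non-negative) by W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def frobenius_loss(X, W, H):
    """Squared Frobenius norm of the residual X - WH."""
    return float(np.linalg.norm(X - W @ H, "fro") ** 2)

# Toy data: six points (columns) forming two obvious groups.
X = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])

W, H = nmf(X, k=2)
labels = H.argmax(axis=0)  # largest entry in each column of H
print("loss:", frobenius_loss(X, W, H))
print("cluster labels:", labels)
```

The numbering of the clusters depends on the random initialization, so only the grouping of the columns is meaningful, not the label values themselves.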


The formulation in (1.3) has been applied to many clustering tasks in which the n data points
are available in X and are used as input. Alternatively, the relationship between the data points
can be represented as a graph, where each node corresponds to a data point and a similarity matrix
A of size n x n contains the similarity values between each pair of nodes (Moutier et al. 2021).
NMF is not a general clustering method that performs well in every circumstance, and this
limitation can be attributed to its assumption about the cluster structure (Kuang et al. 2015). As
noted above, the goal is to approximate the original data matrix using a linear combination of basis
vectors. When the underlying k clusters have a nonlinear structure, NMF cannot find k basis vectors
that each represent one of the clusters.

For this reason we use SymNMF, the symmetric variant of NMF, which takes a symmetric
matrix A as input. This method is based on a similarity measure between data points and
factorizes a symmetric matrix containing pairwise similarity values into the product of a
non-negative matrix and its transpose (Jia et al. 2021). The factorization of A generates a
cluster assignment matrix that is non-negative and captures the cluster structure inherent in the
graph representation. Given an n x n symmetric matrix A and a reduced rank k, SymNMF seeks to
find the best factorization so that:


A \approx H H^T    (1.4)


where H is a non-negative n x k matrix that can be viewed as the cluster indicator and H^T is its transpose.
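
The factorization in (1.4) can be sketched as follows. This is an illustrative solver based on a damped multiplicative update that is often used for symmetric NMF; it is a substitute chosen for brevity and is not the algorithm of the works cited here. The function names, the damping parameter `beta`, the toy similarity matrix and the other parameters are assumptions made for illustration.

```python
import numpy as np

def symnmf(A, k, n_iter=500, beta=0.5, eps=1e-10, seed=0):
    """Approximate a symmetric non-negative A (n x n) by H @ H.T with H >= 0."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    H = rng.random((n, k)) + eps
    for _ in range(n_iter):
        numerator = A @ H
        denominator = H @ (H.T @ H) + eps
        # Damped multiplicative update; entries stay non-negative.
        H *= (1.0 - beta) + beta * numerator / denominator
    return H

def symnmf_loss(A, H):
    """Squared Frobenius norm of the residual A - H H^T."""
    return float(np.linalg.norm(A - H @ H.T, "fro") ** 2)

# Toy similarity matrix: two groups of three nodes, weakly linked to each other.
A = np.full((6, 6), 0.05)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0

H = symnmf(A, k=2)
labels = H.argmax(axis=1)  # row i of H holds the memberships of node i
print("loss:", symnmf_loss(A, H))
print("cluster labels:", labels)
```

Here each row of H describes one node, so the cluster of a node is the column with the largest entry in its row, and no separate post-processing step is needed.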

Compared with NMF, SymNMF concerns only the factorized similarity matrix A and does not
consider whether the structure of the data is linear or non-linear. It can be regarded as a graph
clustering method, and it is more effective than NMF for nonlinearly separable data (Kuang et
al. 2015). It has been shown to be a powerful method for data clustering (Jia et al. 2021) and for
learning topics in text mining (Yan et al. 2013).
SymNMF is also related to spectral clustering (SC): the two share the same loss function and
differ only in their constraints (Ng et al. 2001). However, SymNMF can directly generate the
clustering indicator, whereas SC needs an extra post-processing step, such as K-means, to finalize
the clustering.


3. Methodology 


This work presents preliminary results. The analysis covers the period from
March 2021, when the Russian military moved weapons and equipment into Crimea, to the end of
March 2022, the day of the first negotiations in Istanbul. The selection of the newspapers is based
on a previous study: the report “Pillars of Russia’s Disinformation and Propaganda Ecosystem”
produced by the U.S. Department of State. According to the report, the outlets cover various
geographies, and each has its own target audience. These newspapers are influenced by the
Russian government and its institutions, and thus convey Kremlin-driven information and the
regime's interpretation of the facts of the war. The papers chosen are as follows:

• Strategic Culture Foundation: an online journal directed by Russia’s Foreign
Intelligence Service (SVR) and closely affiliated with the Russian Ministry of Foreign
Affairs.

• Global Research: a Canadian website that has become deeply enmeshed in Russia’s
broader propaganda ecosystem.

• News Front: a Crimea-based disinformation outlet with the goal of providing an “alternative
source of information” for Western audiences.

• South Front: an online information site registered in Russia that focuses on military
and security issues.

• Katehon: a journal that provides material aimed largely at a European
audience, with content devoted to the "creation and defence of a secure, democratic and
just international system."

• Geopolitics: a platform for Russian ultra-nationalists to spread disinformation and
propaganda targeting Western and other audiences.

We extracted 3,396 newspaper articles; two of them were discarded because they were
not written in English, leaving 3,394 articles. Textual data are unstructured, so some
pre-processing is necessary to obtain structured data. We carried out the following
steps:

1. Normalization: all the letters of the texts are converted to lower case;

2. Tokenization: the documents are split into a set of distinct strings (tokens) separated by
spaces or punctuation marks;

3. Cleaning: special characters, punctuation, and numbers are removed from the dataset, together
with hashtags, symbols and stopwords;

4. Grammatical tagging: each word in a text is marked as corresponding to a particular part of
speech. In this case, we kept only the nouns, verbs, and adjectives.

The pre-processing phase returned a corpus composed of 40,360 tokens, 5,010 types and
3,394 documents. The resulting term-document matrix records the number of occurrences of
each term in each document, and its dimension is 5,010 x 3,394.
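
The pre-processing steps and the construction of the term-document matrix can be sketched as follows, using only the Python standard library. This is a simplified illustration and not the pipeline actually used: the stopword list is a tiny hypothetical one, the function names and example sentences are invented, and the grammatical tagging step is omitted because it requires an external part-of-speech tagger.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (hypothetical; a real analysis needs a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "by"}

def preprocess(text):
    """Lowercase, tokenize, and drop punctuation, numbers, hashtags and stopwords."""
    text = text.lower()                     # step 1: normalization
    text = re.sub(r"#\w+", " ", text)       # step 3: remove hashtags
    tokens = re.findall(r"[a-z]+", text)    # steps 2-3: keep alphabetic tokens only
    return [t for t in tokens if t not in STOPWORDS]

def term_document_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term i in document j."""
    tokenized = [preprocess(d) for d in docs]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix

docs = [
    "The operation in Kiev is a military operation.",
    "Economy and debts: the economy of 2022 #news",
]
vocabulary, matrix = term_document_matrix(docs)
print(vocabulary)  # ['debts', 'economy', 'kiev', 'military', 'operation']
print(matrix)      # [[0, 1], [0, 2], [1, 0], [1, 0], [2, 0]]
```

Rows correspond to types and columns to documents, matching the terms x documents orientation of the matrix described above.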


4. Preliminary results 


In the final stage of pre-processing we applied the vector space model to the term-document
matrix. Each word is treated as a vector whose element a_ij represents the weight
of word i within document j. With NMF, the term-document matrix is too sparse
to estimate reliable topics, so denser and more stable data are used.
Following Yan et al. (2013), to reduce the sparsity of the term-document matrix we created
a co-occurrence matrix W composed of the vectors w_i, whose elements a_ij represent the number
of times each word pair <w_i, w_j> co-occurs within the same document. For each pair of vectors,
we calculated the cosine measure, an index of the similarity between two
vectors of an inner product space. In this way we created the similarity matrix S. We then
applied SymNMF to this matrix to identify the main topics, and we found five topics:


Topic 1      Topic 2        Topic 3        Topic 4        Topic 5
vital        kill           basis          disagreement   manifestation
threat       launch         arrangement    development    modern
state        kiev           attempt        dispute        identity
ultimate     military       background     divide         individual
turn         mercenary      asset          disagreement   materialism
step         march          attention      economic       philosopher
urgent       nearby         auspice        economy        manifest
threaten     humanitarian   attack         drive          individualism
statement    operation      associate      dream          mankind
tolerate     munition       benefit        digital        liberty


Tab. 1 - Topics extracted with SymNMF
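
The construction of the similarity matrix S from the term-document matrix can be sketched as follows. The sketch assumes that two words co-occur once for every document that contains both of them, which is one possible reading of the co-occurrence count; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Build a word-by-word similarity matrix from a term-document count matrix X."""
    # Assumption: two words co-occur once for every document containing both.
    B = (X > 0).astype(float)   # binary presence, terms x documents
    C = B @ B.T                 # co-occurrence counts, terms x terms
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    norms[norms == 0] = 1.0     # avoid division by zero for empty rows
    C_unit = C / norms
    return C_unit @ C_unit.T    # cosine between co-occurrence vectors

# Toy term-document matrix: 4 terms x 3 documents (hypothetical counts).
X = np.array([[2, 0, 1],
              [1, 0, 3],
              [0, 4, 0],
              [0, 1, 0]])

S = cosine_similarity_matrix(X)
print(np.round(S, 3))
```

The resulting matrix is symmetric and non-negative, which is the form of input that SymNMF requires.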


Tab. 1 shows the terms associated with the topics extracted with SymNMF. From these terms it is
possible to define the topics that the Russian media used as motivations for the conflict in Ukraine.

The first topic refers to the threat posed by Ukraine. These newspapers present the
conflict as a potential problem for relations with Europe and the West. They stress an increasing
urgency to seek common ground with Ukraine, while declaring that it is responsible for all the
casualties that are occurring. According to the Russian government, Ukraine is exaggerating the
issue, disregarding the collective good and making decisions that only sour its relationship
with the Kremlin.

“Topic 2” refers to the war dimension. Its terms allow us to identify
the main arguments the Russian media used to justify the invasion. The war is presented as a
“humanitarian operation” that Putin undertook to liberate Kiev and Ukraine. Russian propaganda
aims to present the Ukrainian people not as victims but as perpetrators of crimes and
murders: the term "mercenary" refers to a narrative in which Ukrainian soldiers murdered Danish
mercenaries. This serves to dispel the idea of Ukrainians as a subjugated people. All the words
referring to the dimension of war emphasize the belligerent spirit of the population, which is said
to want to get rid of the “nearby” Russian enemy even with the use of atomic bombs and munitions
received from the U.S.

Related to this is “Topic 3”, which highlights diplomatic and international relations. The terms
of this topic are rather general, but the dimension they describe can still be
interpreted. According to Russian media, President Putin has repeatedly proposed talks and
negotiations and set out Russia's conditions, the first being that no NATO base be installed in
Ukraine. The topic of international agreements turns out to be central: on the one hand the
media talk about the Russian government's willingness to mediate with Ukraine, and on the other
hand they emphasize how Ukraine wants to join NATO and improve its relationship with
President Biden.

“Topic 4” identifies the economic motivations of the invasion. There is a common view that
the Ukrainian government initiated the conflict to increase its economic and geopolitical power, to
expand into neighbouring countries and to counter the Russian nation. In particular, the newspapers
report a series of events related to the Ukrainian economy: the emigration of citizens to Poland to
improve wage conditions, the loss of public money, and the resulting debts. The latter aspect is
central, as it is used to predict Ukrainian dependence on America. In this narrative, European
powers are also obliged to give money to Ukraine to pay for previous situations: for example, the
Germans are said to pay a debt to Ukraine for the occupation of Crimea.

The last topic refers to a philosophical and identitarian dimension, in which Western and liberal
values are criticized and exaggerated. The articles report the thoughts of numerous philosophers
who present liberalism in a negative light, stating that it "should be opposed to fascism" but
imposes itself on Western civilizations, taking on the same characteristics. For this reason, the
U.S. is presented as a nation that does not want other world powers to assert themselves and that
imposes its own economic and social vision. In this regard, the Russian media propose a vision of
Russia as a nation that engages in the pluralism of ideas and represents an alternative of freedom
to the Western world. Numerous articles claim that the U.S. wants to change tradition and
classical roots (e.g., Dante's works have been called politically incorrect and have undergone
liberal cleansing) by criticizing the philosophers and thinkers of the time. Russia, on the other
hand, is presented as the guardian of "true and authentic" European values and not of
"globalization" and "liberalism".


5. Conclusions 


As discussed earlier, the work is in the preliminary stage and the aim is to identify the themes 
that the Russian government used to narrate the conflict. For future developments, we are 
expanding the list of Russian information sources in order to conduct a more comprehensive 
analysis. 

The Russian Federation invests in its propaganda channels and its intelligence services to
support its information system, and it leverages news sites and research
institutions to spread these narratives.

The Kremlin thus uses these tactics as part of its approach to using information as a weapon. In
this regard, the Russian government has issued a series of measures, ordering all media outlets to
report on the invasion of Ukraine only through official state sources, blocking numerous sites for
spreading unfounded news, and threatening charges of high treason. Russia’s willingness to employ
this approach provides it with three advantages. First, it allows for the introduction of numerous
variations of the same narratives, fine-tuned to suit different target audiences.
Second, it provides plausible deniability for Kremlin officials when they peddle different
information, allowing them to deflect criticism while still introducing damaging information.
Lastly, it creates a media multiplier effect that boosts their reach and resonance.